The present disclosure relates to a technology for synthesizing sound.
A fragment-connection-type sound synthesizing technology has conventionally been proposed in which a duration and an utterance content (for example, lyrics) are specified for each unit of synthesis, such as a musical note (hereinafter referred to as “unit sound”), and a plurality of sound fragments corresponding to the utterance content of each unit sound are interconnected to generate a desired synthesized sound. According to JP-B-4265501, a sound fragment corresponding to a vowel phoneme, among a plurality of phonemes corresponding to the utterance content of each unit sound, is prolonged, so that a synthesized sound in which the utterance content of each unit sound is uttered over a desired duration can be generated.
There are cases where, for example, a polyphthong (a diphthong or a triphthong) consisting of a plurality of vowels coupled together is specified as the utterance content of one unit sound. One conceivable configuration for ensuring a sufficient duration for such a unit sound is to prolong the sound fragment of the first vowel of the polyphthong. However, when the object to be prolonged is fixed to the first vowel of the unit sound, the synthesized sounds that can be generated are limited. For example, assume that an utterance content “fight” (one syllable), which contains a polyphthong in which a vowel phoneme /a/ and a vowel phoneme /I/ are continuous within one syllable, is specified as one unit sound. In this case, a synthesized sound “[fa:It]” in which the first phoneme /a/ of the polyphthong is prolonged can be generated, but a synthesized sound “[faI:t]” in which the rear phoneme /I/ is prolonged cannot be generated (the symbol “:” denotes a prolonged sound). While a polyphthong is described above as an example, a similar problem can occur whenever a plurality of phonemes are continuous in one syllable, irrespective of whether they are vowels or consonants.
In view of the above circumstances, an object of the present disclosure is to generate a variety of synthesized sounds by easing this restriction on the prolongation of sound fragments.
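Purely as an illustration of the restriction discussed above, the difference between a fixed prolongation target and a selectable one can be sketched in Python as follows. The sketch operates on phoneme symbols only and performs no sound synthesis. The function names `prolong_first_vowel` and `prolong_at`, the `VOWELS` set, and the SAMPA-like notation in which the diphthong of “fight” is written as /a/ followed by /I/ are assumptions introduced for this example; they are not taken from JP-B-4265501 or from any actual implementation.

```python
from typing import List

# Hypothetical vowel inventory for this sketch (SAMPA-like symbols).
VOWELS = {"a", "e", "i", "o", "u", "I", "E", "U"}


def prolong_at(phonemes: List[str], index: int) -> str:
    """Relaxed behaviour: the caller selects which phoneme is prolonged."""
    if not 0 <= index < len(phonemes):
        raise IndexError("phoneme index out of range")
    # Append the prolongation mark ":" to the selected phoneme only.
    marked = [p + ":" if i == index else p for i, p in enumerate(phonemes)]
    return "".join(marked)


def prolong_first_vowel(phonemes: List[str]) -> str:
    """Restricted behaviour: the prolonged phoneme is always the first vowel."""
    for index, phoneme in enumerate(phonemes):
        if phoneme in VOWELS:
            return prolong_at(phonemes, index)
    return "".join(phonemes)  # no vowel present, so nothing is prolonged


fight = ["f", "a", "I", "t"]
first_vowel_result = prolong_first_vowel(fight)  # front phoneme /a/ prolonged
rear_vowel_result = prolong_at(fight, 2)         # rear phoneme /I/ prolonged
print(first_vowel_result)
print(rear_vowel_result)
```

With the restricted function, only the form in which /a/ is prolonged is reachable for this unit sound; the form in which /I/ is prolonged requires that the prolongation target be selectable, as in the hypothetical `prolong_at`.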